Compilation techniques for high-performance embedded systems with multiple processors
Abstract
Despite the progress made in developing more advanced compilers for embedded systems, programming embedded high-performance computing systems based on Digital Signal Processors (DSPs) is still a highly skilled manual task. This is true for single-processor systems, and even more so for embedded systems based on multiple DSPs. Compilers often fail to optimise existing DSP codes written in C because of the programming style employed. Parallelisation is hampered by the complex memory architecture with multiple address spaces found in most commercial multi-DSP configurations. This thesis develops an integrated optimisation and parallelisation strategy that can deal with low-level C codes and produces optimised parallel code for a homogeneous multi-DSP architecture with distributed physical memory and multiple logical address spaces. In a first step, low-level programming idioms are identified and recovered. This enables the application of high-level code and data transformations well known in the field of scientific computing. Iterative, feedback-driven search for “good” transformation sequences is also investigated. A novel approach to parallelisation based on a unified data and loop transformation framework is presented and evaluated. Performance optimisation is achieved through exploitation of data locality on the one hand, and utilisation of DSP-specific architectural features such as Direct Memory Access (DMA) transfers on the other. The proposed methodology is evaluated against two benchmark suites (DSPstone and UTDSP) and four different high-performance DSPs, one of which is part of a commercial four-processor multi-DSP board also used for evaluation. Experiments confirm the effectiveness of the program recovery techniques as enablers of high-level transformations and automatic parallelisation. Source-to-source transformations of DSP codes yield an average speedup of 2.21 across four different DSP architectures.
In conjunction with a set of locality optimisations, the parallelisation scheme is able to produce linear and even super-linear speedups on a number of relevant DSP kernels and applications.
Similar sources
Complexities in DSP Software Compilation: Performance, Code Size, Power, Retargetability
This paper presents a new methodology for software compilation for embedded DSP systems. Although it is well known that conventional compilation techniques do not produce high quality DSP code, few researchers have addressed this area. Performance, estimated power dissipation, and code size are important design constraints in embedded DSP design. New techniques for code generation targeting DSP...
Time-Predictable Java Dynamic Compilation on Multicore Processors
Java has been increasingly used in programming for real-time systems. However, some of Java’s features such as automatic memory management and dynamic compilation are harmful to time predictability. If these problems are not solved properly then it can fundamentally limit the usage of Java for real-time systems, especially for hard real-time systems that require very high time predictability. I...
Software Level Power Consumption Models and Power Saving Techniques for Embedded DSP Processors
Unlike DSP compilation for high performance, research on low-power optimisation has received little attention, although power dissipation is a critical issue for mobile devices. This paper presents an overview of power consumption models and power saving techniques for embedded DSP processor applications and evaluates their application to the Texas Instruments TMS320VC5510 Digital Signal Proc...
Efficient Block Scheduling to Minimize Context Switching Time for Programmable Embedded Processors
Scheduling is one of the most often addressed optimization problems in DSP compilation, behavioral synthesis, and system-level synthesis research. With the rapid pace of changes in modern DSP applications requirements and implementation technologies, however, new types of scheduling challenges arise. This paper is concerned with the problem of scheduling blocks of computations in order to optim...
Optimizing Java Bytecode for Embedded Systems
Modern Java Virtual Machines (JVMs) for desktop and server computers use just-in-time (JIT) compilation to increase their performance. For embedded Java processors, JIT compilation is usually not feasible. Therefore the Java bytecode needs to be optimized for a specific platform ahead-of-time. To generate optimized bytecode for the JOP Java processor, several existing tools were compared. In order to impleme...